Supply Your Arguments

  • For clarity, I’ve described each parameter in its own section. At the end, we will feed all of these arguments into the eQTL_boxplots function and run it.

1. snp_list

  • Supply a list of SNPs (using their rsIDs) that you want to plot.

2. eQTL_SS_paths

  • Save the root directory as a variable so you don’t have to write it out each time.
  • Supply a list of paths to the summary statistics (SS) files you wish to use (one per condition).
## [1] "Data/eQTL/Fairfax_2014/CD14/LRRK2/LRRK2_Fairfax_CD14.txt"  
## [2] "Data/eQTL/Fairfax_2014/IFN/LRRK2/LRRK2_Fairfax_IFN.txt"    
## [3] "Data/eQTL/Fairfax_2014/LPS2/LRRK2/LRRK2_Fairfax_LPS2.txt"  
## [4] "Data/eQTL/Fairfax_2014/LPS24/LRRK2/LRRK2_Fairfax_LPS24.txt"
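One way to build the paths listed above is sketched here. The variable names `root` and `conditions` are my own choices for illustration; eQTL_boxplots only needs the final character vector.

```r
# Root directory shared by all Fairfax eQTL files (saved once, reused below).
root <- "Data/eQTL/Fairfax_2014"

# One summary statistics file per condition.
conditions <- c("CD14", "IFN", "LPS2", "LPS24")
eQTL_SS_paths <- file.path(root, conditions, "LRRK2",
                           paste0("LRRK2_Fairfax_", conditions, ".txt"))
eQTL_SS_paths
```

Building the vector from `conditions` keeps the file order and the condition order in sync if you add or drop a condition later.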

3. expression_paths

  • Now we do the same for the gene expression files.
  • Currently, eQTL_boxplots assumes that each expression file is tab-delimited and contains the following columns:
    • [1] PROBE_ID (containing the probe ID, e.g. ILMN_1802380)
    • [2] 1 -> n (one column per subject, where the column name corresponds to the subject’s ID. In Fairfax, the subject IDs are just numbers (1-421), but these could be any unique identifier)
## [1] "Data/eQTL/Fairfax_2014/CD14/CD14.47231.414.b.txt"  
## [2] "Data/eQTL/Fairfax_2014/IFN/IFN.47231.367.b.txt"    
## [3] "Data/eQTL/Fairfax_2014/LPS2/LPS2.47231.261.b.txt"  
## [4] "Data/eQTL/Fairfax_2014/LPS24/LPS24.47231.322.b.txt"
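The expression paths above can be assembled the same way as the SS paths. The sketch below also includes `check_expression_file`, a hypothetical helper of mine (not part of eQTL_boxplots) that checks a file against the layout described above, demonstrated on a toy file with made-up expression values.

```r
root <- "Data/eQTL/Fairfax_2014"
expression_paths <- file.path(root, c("CD14/CD14.47231.414.b.txt",
                                      "IFN/IFN.47231.367.b.txt",
                                      "LPS2/LPS2.47231.261.b.txt",
                                      "LPS24/LPS24.47231.322.b.txt"))

# Hypothetical helper: confirm the file is tab-delimited, starts with
# PROBE_ID, and has unique subject IDs as the remaining column names.
check_expression_file <- function(path) {
  header <- names(read.delim(path, nrows = 1, check.names = FALSE))
  stopifnot(header[1] == "PROBE_ID", !anyDuplicated(header))
  invisible(header)
}

# Toy file with three subjects (values are invented for illustration).
toy <- tempfile(fileext = ".txt")
writeLines(c("PROBE_ID\t1\t2\t3",
             "ILMN_1802380\t7.1\t6.9\t7.4"), toy)
check_expression_file(toy)
```

`check.names = FALSE` matters here: without it, R would rename numeric subject IDs such as `1` to `X1`.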

4. probe_path

  • Since eQTL SS and expression files often list the probe (instead of the gene symbol it targets), you’ll need to provide a probe mapping file with two columns:
    • [1] GENE (containing gene symbols)
    • [2] PROBE_ID (containing the corresponding probe ID, e.g. ILMN_1802380).
  • Genes often have multiple probes. By default, eQTL_boxplots only uses the probe with the highest average expression across all samples.
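To make the default probe choice concrete, here is a minimal sketch of picking the probe with the highest mean expression. It is not the code eQTL_boxplots uses internally; `pick_top_probe`, the probe IDs, and the expression values are all made up for illustration.

```r
# Toy probe mapping file and expression table (invented IDs and values).
probe_map <- data.frame(GENE = c("LRRK2", "LRRK2", "GENE_B"),
                        PROBE_ID = c("ILMN_A", "ILMN_B", "ILMN_C"),
                        stringsAsFactors = FALSE)
expression <- data.frame(PROBE_ID = c("ILMN_A", "ILMN_B", "ILMN_C"),
                         `1` = c(5.0, 8.0, 6.0),
                         `2` = c(5.5, 8.5, 6.5),
                         check.names = FALSE, stringsAsFactors = FALSE)

# Hypothetical helper: return the probe for `gene` with the highest
# average expression across all subject columns.
pick_top_probe <- function(gene, probe_map, expression) {
  probes <- probe_map$PROBE_ID[probe_map$GENE == gene]
  candidates <- expression[expression$PROBE_ID %in% probes, , drop = FALSE]
  subject_cols <- setdiff(names(candidates), "PROBE_ID")
  means <- rowMeans(candidates[, subject_cols, drop = FALSE])
  candidates$PROBE_ID[which.max(means)]
}

top_probe <- pick_top_probe("LRRK2", probe_map, expression)
top_probe
```

In this toy data, ILMN_B averages 8.25 against 5.25 for ILMN_A, so ILMN_B is the probe that gets used.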

5. gene

  • Specify the gene (or list of genes) you want to get the expression data for. eQTL_boxplots will use the probe mapping file to pull out the relevant probe(s) for this gene.
  • In this case, the probe mapping file uses official HGNC gene symbols, but you should use whatever format your probe mapping file uses (e.g. ENSEMBL IDs).

6. genotype_path

  • The genotype file is pretty huge, so you want to work with a subset of it in R. If you have already subset the data to just the SNPs you’re interested in…great! Go ahead and use the path to the subsetted file.
  • If you only have the full genotype file, fret not! Simply supply the path to the full genotype data instead AND set subset_genotype_file = T. eQTL_boxplots will then automatically search the genotype file for the SNPs of interest and extract them.
    • NOTE: Even though this uses grep, this process is still fairly slow and can take several minutes to extract the subset.
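For intuition, the subsetting step can be sketched roughly as below. This is an assumption-laden illustration, not what eQTL_boxplots actually runs: `subset_genotypes` is a hypothetical helper, and the toy file assumes a tab-delimited layout with a header row, one SNP per row, and CHR, SNP and POS columns followed by one column per subject.

```r
# Hypothetical helper: keep the header plus every row that mentions one of
# the requested rsIDs. Word boundaries stop rs123 from matching rs1234.
subset_genotypes <- function(genotype_path, snp_list) {
  lines <- readLines(genotype_path)
  pattern <- paste0("\\b(", paste(snp_list, collapse = "|"), ")\\b")
  keep <- grepl(pattern, lines[-1], perl = TRUE)
  c(lines[1], lines[-1][keep])
}

# Toy genotype file (invented SNPs and genotypes).
toy <- tempfile(fileext = ".txt")
writeLines(c("CHR\tSNP\tPOS\t1\t2",
             "12\trs123\t100\t0\t1",
             "12\trs1234\t200\t2\t1",
             "12\trs999\t300\t1\t1"), toy)

subset_lines <- subset_genotypes(toy, c("rs123", "rs999"))
geno_subset <- read.delim(text = subset_lines, check.names = FALSE)
geno_subset
```

This toy version reads the whole file into memory, which would be a poor fit for a truly large genotype file; it is only meant to show the matching logic.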

7. .fam_path

  • Annoyingly, the subject IDs are not included in the genotype file and must be pulled from the corresponding .fam file that was also generated by plink. The genotype subject IDs are not necessarily in numerical order (1, 2, 3, 4, etc.), so if you assume they are, your data is going to be all mixed up.
  • Instead, eQTL_boxplots takes the .fam file and correctly labels the genotype subject IDs for you.
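The relabelling idea can be sketched as follows. A plink .fam file has no header and six whitespace-separated columns (family ID, individual ID, father, mother, sex, phenotype). The toy genotype table and its V1..V3 column names are invented, and the sketch assumes the genotype sample columns appear in the same order as the rows of the .fam file.

```r
# Toy .fam file: note the individual IDs (2nd column) are not sorted.
fam_file <- tempfile(fileext = ".fam")
writeLines(c("3 3 0 0 1 -9",
             "1 1 0 0 2 -9",
             "2 2 0 0 1 -9"), fam_file)
fam <- read.table(fam_file, header = FALSE,
                  col.names = c("FID", "IID", "PAT", "MAT", "SEX", "PHENO"))

# Toy genotype table with unlabelled sample columns (assumed .fam order).
geno <- data.frame(CHR = 12, SNP = c("rs123", "rs999"), POS = c(100, 300),
                   V1 = c(0, 1), V2 = c(2, 1), V3 = c(1, 0),
                   stringsAsFactors = FALSE)

# Rename the sample columns using the individual IDs from the .fam file.
sample_cols <- setdiff(names(geno), c("CHR", "SNP", "POS"))
stopifnot(length(sample_cols) == nrow(fam))
names(geno)[match(sample_cols, names(geno))] <- as.character(fam$IID)
names(geno)
```

Had we labelled the columns 1, 2, 3 by position instead, subject 3's genotypes would have been attributed to subject 1.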

8. plotting parameters

  • show_plot : You can specify whether you want to actually show the plot ( = TRUE).
  • SS_annotations : You can choose whether to add annotations (Beta, P, FDR) to each boxplot (= TRUE).
  • interact : Make the plot interactive using plotly::ggplotly! (= TRUE). [NOTE: under construction…]

Run eQTL_boxplots Function

  • Time to use your arguments to run eQTL_boxplots! In this example, I have supplied 3 SNPs and eQTL data from 4 different conditions, so 12 eQTL boxplots will be generated.
  • If the values are available in the eQTL_SS_paths files, eQTL_boxplots will also annotate each boxplot with:
    • Beta (effect size)
    • P (P-value)
    • FDR (False Discovery Rate)
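Putting the arguments together might look something like the sketch below. The argument names are taken from the section headings in this walkthrough, so treat the exact signature as an assumption. The rsIDs and the probe, genotype and .fam paths are placeholders to replace with your own, and the gene is assumed to be LRRK2 based on the SS file names.

```r
root <- "Data/eQTL/Fairfax_2014"
conditions <- c("CD14", "IFN", "LPS2", "LPS24")

plot_args <- list(
  snp_list = c("rs_placeholder_1", "rs_placeholder_2", "rs_placeholder_3"),
  eQTL_SS_paths = file.path(root, conditions, "LRRK2",
                            paste0("LRRK2_Fairfax_", conditions, ".txt")),
  expression_paths = file.path(root, c("CD14/CD14.47231.414.b.txt",
                                       "IFN/IFN.47231.367.b.txt",
                                       "LPS2/LPS2.47231.261.b.txt",
                                       "LPS24/LPS24.47231.322.b.txt")),
  probe_path = "path/to/probe_mapping.txt",    # placeholder
  gene = "LRRK2",                              # assumed from the file names
  genotype_path = "path/to/genotypes.txt",     # placeholder
  subset_genotype_file = TRUE,
  .fam_path = "path/to/genotypes.fam",         # placeholder
  show_plot = TRUE,
  SS_annotations = TRUE,
  interact = FALSE
)

# 3 SNPs x 4 conditions = 12 boxplots expected.
n_plots <- length(plot_args$snp_list) * length(plot_args$eQTL_SS_paths)

# Only call the function if it has been loaded into the session.
if (exists("eQTL_boxplots")) {
  do.call(eQTL_boxplots, plot_args)
}
```

Collecting the arguments in a list first makes it easy to inspect them, or to rerun with one value changed.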
## [1] ""
## [1] "+ Processsing Expression data"
## [1] "++ Extracting probe info"
## [1] "++ CD14"
## [1] "++ IFN"
## [1] "++ LPS2"
## [1] "++ LPS24"
## [1] ""
## [1] "+ Processing Summary Stats data"
## [1] ""
## [1] "+ Processing Genotype data"
## Warning in melt.data.table(., geno_subset, id.vars = c("CHR", "SNP",
## "POS", : 'measure.vars' [289, 290, 292, 293, ...] are not all of the same
## type. By order of hierarchy, the molten data value column will be of type
## 'double'. All measure variables not of type 'double' will be coerced too.
## Check DETAILS in ?melt.data.table for more on coercion.
## [1] ""
## [1] "+ Merging Summary Stats, Genotype, and Expression data"
## [1] ""
## [1] "+ Plotting eQTLs"

Output Data

  • eQTL_boxplots also produces a data.table containing all the SS, expression, and genotype data used to create the plots.
  • Each row is a data point for a given SNP, within a given condition, within a given individual.
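As a rough picture of that long format, the sketch below builds the row layout for 3 SNPs, 4 conditions and 2 subjects. The column names and SNP IDs are illustrative guesses, not the actual columns of the output, and the real table would also carry the genotype, expression and SS values on each row.

```r
# One row per SNP x condition x subject combination.
layout <- expand.grid(SNP = c("rs_A", "rs_B", "rs_C"),
                      Condition = c("CD14", "IFN", "LPS2", "LPS24"),
                      Subject = c("1", "2"),
                      stringsAsFactors = FALSE)
nrow(layout)
head(layout)
```

With these toy sizes the table has 3 x 4 x 2 = 24 rows; with real data the row count scales with the number of subjects available in each condition.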